{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using TensorFlow Scripts in SageMaker - Quickstart\n",
    "\n",
    "Starting with TensorFlow version 1.11, you can use SageMaker's TensorFlow containers to train TensorFlow scripts the same way you would train outside SageMaker. This feature is named **Script Mode**. \n",
    "\n",
    "This example uses \n",
    "[Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow). \n",
    "You can use the same technique for other scripts or repositories, including \n",
    "[TensorFlow Model Zoo](https://github.com/tensorflow/models) and \n",
    "[TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test locally using SageMaker Python SDK TensorFlow Estimator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use the SageMaker Python SDK [`TensorFlow`](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/README.rst#training-with-tensorflow) estimator to easily train locally and in SageMaker. \n",
    "\n",
    "Let's start by setting the training script arguments `--num_epochs` and `--data_dir` as hyperparameters. Remember that we don't need to provide `--model_dir`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sagemaker\n",
    "from sagemaker import get_execution_role\n",
    "from sagemaker.tensorflow import TensorFlow\n",
    "\n",
    "role = get_execution_role()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hyperparameters = {'task_name': 'ner', 'do_train': 'true', 'do_eval': 'true', 'do_export': 'true', \n",
    "                   'data_dir': '/opt/ml/input/data/training', 'output_dir': '/opt/ml/output', 'model_dir': '/opt/ml/model', \n",
    "                   'vocab_file': './albert_config/vocab.txt', 'bert_config_file': './albert_base_zh/albert_config_base.json', \n",
    "                   'max_seq_length': 128, 'train_batch_size': 64, 'learning_rate': 2e-5, 'num_train_epochs': 1}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. Just change your estimator's train_instance_type to local or local_gpu. For more information, see: https://github.com/aws/sagemaker-python-sdk#local-mode.\n",
    "\n",
    "In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU). Running following script will install docker-compose or nvidia-docker-compose and configure the notebook environment for you.\n",
    "\n",
    "Note, you can only run a single local notebook at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!/bin/bash ./utils/setup.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To train locally, you set `train_instance_type` to [local](https://github.com/aws/sagemaker-python-sdk#local-mode):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess\n",
    "\n",
    "train_instance_type='local'\n",
    "\n",
    "if subprocess.call('nvidia-smi') == 0:\n",
    "    ## Set type to GPU if one is present\n",
    "    train_instance_type = 'local_gpu'\n",
    "    \n",
    "print(\"Train instance type = \" + train_instance_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We create the `TensorFlow` Estimator, passing the `git_config` argument and the flag `script_mode=True`. Note that we are using Git integration here, so `source_dir` should be a relative path inside the Git repo; otherwise it should be a relative or absolute local path. the `Tensorflow` Estimator is created as following: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator = TensorFlow(entry_point='albert_ner.py',\n",
    "                       source_dir='.',\n",
    "                       train_instance_type=train_instance_type,\n",
    "                       train_instance_count=1,\n",
    "                       hyperparameters=hyperparameters,\n",
    "                       role=role,\n",
    "                       framework_version='1.15.2',\n",
    "                       py_version='py3',\n",
    "                       script_mode=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To start a training job, we call `estimator.fit(inputs)`, where inputs is a dictionary where the keys, named **channels**, \n",
    "have values pointing to the data location. `estimator.fit(inputs)` downloads the TensorFlow container with TensorFlow Python 3, CPU version, locally and simulates a SageMaker training job. \n",
    "When training starts, the TensorFlow container executes **train.py**, passing `hyperparameters` and `model_dir` as script arguments, executing the example as follows:\n",
    "```bash\n",
    "python -m train --num-epochs 1 --data_dir /opt/ml/input/data/training --model_dir /opt/ml/model\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs = {'training': f'file:///home/ec2-user/SageMaker/albert-chinese-ner/data/'}\n",
    "\n",
    "estimator.fit(inputs)"
   ]
  },
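  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After local training completes, you can deploy the model to a local endpoint for a quick test. Reusing `train_instance_type` keeps the endpoint in local mode:"
   ]
  },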
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor = estimator.deploy(1, train_instance_type)"
   ]
  },
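  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The endpoint serves the exported model with TensorFlow Serving. As a minimal sketch of an invocation (the payload below is a hypothetical placeholder; the exact request format depends on the serving signature that `albert_ner.py` exports):\n",
    "\n",
    "```python\n",
    "# Hypothetical payload: adapt the feature names and shapes to the exported serving signature\n",
    "result = predictor.predict({'instances': [{'input_ids': [0] * 128, 'input_mask': [1] * 128, 'segment_ids': [0] * 128}]})\n",
    "print(result)\n",
    "```\n",
    "\n",
    "When you are done testing, delete the endpoint to free local resources:\n",
    "\n",
    "```python\n",
    "predictor.delete_endpoint()\n",
    "```"
   ]
  },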
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's explain the values of `--data_dir` and `--model_dir` with more details:\n",
    "\n",
    "- **/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is downloaded to this folder because `training` is the channel name defined in ```estimator.fit({'training': inputs})```. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information. \n",
    "\n",
    "- **/opt/ml/model** use this directory to save models, checkpoints, or any other data. Any data saved in this folder is saved in the S3 bucket defined for training. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.\n",
    "\n",
    "### Reading additional information from the container\n",
    "\n",
    "Often, a user script needs additional information from the container that is not available in ```hyperparameters```.\n",
    "SageMaker containers write this information as **environment variables** that are available inside the script.\n",
    "\n",
    "For example, the example above can read information about the `training` channel provided in the training job request by adding the environment variable `SM_CHANNEL_TRAINING` as the default value for the `--data_dir` argument:\n",
    "\n",
    "```python\n",
    "if __name__ == '__main__':\n",
    "  parser = argparse.ArgumentParser()\n",
    "  # reads input channels training and testing from the environment variables\n",
    "  parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])\n",
    "```\n",
    "\n",
    "Script mode displays the list of available environment variables in the training logs. You can find the [entire list here](https://github.com/aws/sagemaker-containers/blob/master/README.rst#list-of-provided-environment-variables-by-sagemaker-containers)."
   ]
  },
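  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of how a training script can combine several of these variables (assuming it runs inside a SageMaker TensorFlow container, where `SM_CHANNEL_TRAINING`, `SM_MODEL_DIR`, and `SM_NUM_GPUS` are set):\n",
    "\n",
    "```python\n",
    "import argparse\n",
    "import os\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    parser = argparse.ArgumentParser()\n",
    "    # Location of the 'training' input channel inside the container\n",
    "    parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])\n",
    "    # /opt/ml/model: contents are uploaded to S3 when training ends\n",
    "    parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])\n",
    "    # Number of GPUs available on the training instance\n",
    "    parser.add_argument('--num_gpus', type=int, default=int(os.environ.get('SM_NUM_GPUS', '0')))\n",
    "    args, _ = parser.parse_known_args()\n",
    "```"
   ]
  },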
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training in SageMaker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After you test the training job locally, upload the dataset to an S3 bucket so SageMaker can access the data during training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "\n",
    "inputs = sagemaker.Session().upload_data(path='/home/ec2-user/SageMaker/albert-chinese-ner/data', key_prefix='DEMO-tensorflow-albert-chinese-ner')\n",
    "print(inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The returned variable inputs above is a string with a S3 location which SageMaker Tranining has permissions\n",
    "to read data from."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To train in SageMaker:\n",
    "- change the estimator argument `train_instance_type` to any SageMaker ml instance available for training.\n",
    "- set the `training` channel to a S3 location."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "estimator = TensorFlow(entry_point='albert_ner.py',\n",
    "                       source_dir='.',\n",
    "                       instance_type='ml.p3.2xlarge', # Executes training in a ml.p2.xlarge instance\n",
    "                       instance_count=1,\n",
    "                       hyperparameters=hyperparameters,\n",
    "                       role=role,\n",
    "                       framework_version='1.15.2',\n",
    "                       py_version='py3',\n",
    "                       script_mode=True)\n",
    "\n",
    "estimator.fit({'training': inputs})"
   ]
  },
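  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the training job completes, deploy the trained model to a real-time SageMaker endpoint (the same request-format caveat as in local mode applies):"
   ]
  },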
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instance_type = 'ml.m5.xlarge'\n",
    "predictor = estimator.deploy(1, instance_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Git Support"
   ]
  },
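  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of keeping a local copy of the training code, you can let the estimator pull `entry_point` and `source_dir` directly from a Git repository by passing `git_config`. With Git integration, `source_dir` is interpreted as a relative path inside the repo. Note that the training data is not part of the repo, so the script itself must download and extract it (see the TODO below):"
   ]
  },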
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "git_config = {'repo': 'https://github.com/whn09/albert-chinese-ner.git', 'branch': 'master'}\n",
    "\n",
    "# TODO You need to add codes to albert_ner.py to download and extract data\n",
    "\n",
    "estimator = TensorFlow(entry_point='albert_ner.py',\n",
    "                       source_dir='.',\n",
    "                       git_config=git_config,\n",
    "                       instance_type='ml.p3.2xlarge', # Executes training in a ml.p2.xlarge instance\n",
    "                       instance_count=1,\n",
    "                       hyperparameters=hyperparameters,\n",
    "                       role=role,\n",
    "                       framework_version='1.15.2',\n",
    "                       py_version='py3',\n",
    "                       script_mode=True)\n",
    "\n",
    "estimator.fit({'training': inputs})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_tensorflow_p36",
   "language": "python",
   "name": "conda_tensorflow_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
